1 Objective:

  • Remove data on landowners who own a portion of his farmland in both inside and outside of the buffer of LR side.
  • Aggregate the data to make it to annual water usage data by landowner and conduct CF-analysis.

1.1 Data

# === setup === #
library(here)
source(here("GitControlled/Codes/0_1_ls_packages.R"))
source(here("GitControlled/Codes/0_0_functions.R"))

#/*--------------------------------*/
#' ## Data
#/*--------------------------------*/
# === NRD boundary === #
nrd_boud <- 
 here("Shared/Data/WaterAnalysis/NRD_bd/BND_NaturalResourceDistricts_DNR.shp") %>%
 st_read() %>%
 filter(NRD_Name %in% c("Lower Republican", "Tri-Basin")) %>%
 st_transform(4269)
Reading layer `BND_NaturalResourceDistricts_DNR' from data source 
  `/Users/shunkeikakimoto/Dropbox/ResearchProject/HeterogeneousAllocation/Shared/Data/WaterAnalysis/NRD_bd/BND_NaturalResourceDistricts_DNR.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 23 features and 6 fields
Geometry type: POLYGON
Dimension:     XY
Bounding box:  xmin: -11583170 ymin: 4865934 xmax: -10609670 ymax: 5312233
Projected CRS: WGS 84 / Pseudo-Mercator
# === well are located in east side of US Highway 183?  === #
east_or_west <- 
  here("Shared/Data/WaterAnalysis/east_or_west.rds") %>%
  readRDS()


# === Data period: 2008-2015  === #
reg_data_all <- 
  here("Shared/Data/WaterAnalysis/comp_reg_dt.rds") %>%
  readRDS() %>%
  .[,owner_name := paste0(firstname, "_", lastname)] %>%
  .[year %in% 2008:2015 & usage <= 45] %>%
  east_or_west[., on = "wellid"]


# === Data period: 2008-2012 === #
reg_data <- 
  reg_data_all %>%
  .[year %in% 2008:2012]

2 Identify landowners whose farmland locates in both inside and outside of the five-mile buffer

2.1 Identify landowners in LR who have farmlands outside of the buffer

# === All Well Data === #
ir_data <- 
  here('Shared/Data/ir_reg.rds') %>%
  readRDS() %>%
  data.table() %>%
  .[source %in% c('Meter','METER','metered') & nrdname %in% c("Lower Republican", "Tri-Basin")] %>%
  .[,owner_name := paste0(firstname, "_", lastname)]
  # .[year %in% 2008:2012]

# --- check: Low_Tri_5mi --- #
ggplot() + 
  geom_sf(data=nrd_boud) +
  geom_sf(data=st_as_sf(ir_data, coords = c("longdd","latdd"), crs = 4269), 
    aes(color=factor(Low_Tri_5mi)), size=0.5)

  • NOTE:
    • Remember that some farmers own farmlands in both NRDs. We regard those farmlands as if they are owned by two different owners in each NRD.
      • Total number of landowners in the data (before we exclude some farmers from data): 770:
        • only in LR: 325
        • only in TB: 383
        • Both: 62


  • We want to remove only the fields related to farmers who also have farmland outside of the buffer in LR side. If we simply use owners’ name to filter the data, we might also eliminate the farmland that owner owened in TB. So a special care is required to distinguish those owners who owns multiple fields in both LR and TB.
    • We use landowner name associated with NRD name (ex. "Lower Republican ownerA")to filter the data.


  • The below creates a list of landowners who own at least one of their farmlands outside of the buffer in LR.
outside_5mi_LR <- 
  ir_data %>%
  .[,nrd_owner_name := paste0(nrdname, " ", owner_name) ] %>%
  .[nrdname=="Lower Republican" & Low_Tri_5mi==0, nrd_owner_name] %>%
  unique()


  • Then, in the regression data(2008-2012), create a column of binary index both_InOut_5mi_LR showing whether a landowner has farmlands in both inside and outside of the buffer facing LR so that we can distinguish them later.


# create "both_InOut_5mi" index
reg_data <- 
  reg_data %>% 
  .[,nrd_owner_name := paste0(nrdname, " ", owner_name) ] %>%
  .[, both_InOut_5mi_LR := ifelse(nrd_owner_name %in% outside_5mi_LR, 1, 0)]


  • After distinguishing owners who owns multiple fields in both LR and TB, the total number of farmers included in the data turned out to be: 832:
    • LR: 387
    • TB: 445


reg_data %>%
  unique(., by="nrd_owner_name") %>%
  .[, .N, by=.(nrdname, both_InOut_5mi_LR) ] %>%
  .[order(nrdname)] %>%
  print()
            nrdname both_InOut_5mi_LR     N
             <char>             <num> <int>
1: Lower Republican                 0   306
2: Lower Republican                 1    81
3:        Tri-Basin                 0   445
  • The above table shows the number of farmers for each NRD. both_InOut_5mi_LR=1 indicates the farmers who also owns farmland outside of the 5-mile buffer of LR side, and the number fo those farmers is 81.
    • Data related to these 81 farmers is removed from the regression data.
    • The total number of farmers included in the data is 306+445=751.


3 New Data (Annual water-use data by owner)

3.1 Flow

  • Use the new criteria both_InOut_5mi_LR to filter the data.
  • For LR data, aggregate the annual data by field to annual data by owner.
  • For TB data, field-level annual data is used.
mod_reg_data <- 
  reg_data[both_InOut_5mi_LR==0,]

# - TB - #
reg_data_TB <- 
  mod_reg_data[nrdname=="Tri-Basin",]

# - LR - #
reg_data_LR <- 
  mod_reg_data[nrdname=="Lower Republican",]

3.2 Problem

  • Which variable should be used for clustering?
    • I used to use tr (combinations of “township” and “range” index).
  • The value of tr is not one, if a farmer has farmland across several “range” .
# Ideally, a single farmers have only one single "tr". Check how many of farmers whose farmland is scattered across multiple "tr"

tr_cl <- 
  reg_data_LR %>%
  distinct(nrd_owner_name, tr) %>%
  .[, index := 1,] %>%
  dcast(.,nrd_owner_name ~ tr, value.var = "index") %>%
  replace(is.na(.), 0) %>%
  .[, total := rowSums(.SD), by = nrd_owner_name]

# tr_cl[total>1, ]
  • 21 farmers in LR have farmlands across several tr segments.


  • We need some new geographic segmentation index which covers larger area.
    • Alternatives:
      • County (This time, I used county.)
      • ?


county_cl <- 
  reg_data_LR %>%
  distinct(nrd_owner_name, county) %>%
  .[, index := 1,] %>%
  dcast(.,nrd_owner_name ~ county, value.var = "index") %>%
  replace(is.na(.), 0) %>%
  .[, total := rowSums(.SD), by = nrd_owner_name]

# county_cl[total>1, nrd_owner_name]
county_cl[total>1, ] %>% print()
Key: <nrd_owner_name>
                            nrd_owner_name FRANKLIN FURNAS HARLAN KEARNEY total
                                    <char>    <num>  <num>  <num>   <num> <num>
1:            Lower Republican David_Black        1      0      1       0     2
2: Lower Republican John & Ingrid_Tangeman        1      1      0       0     2
3:  Lower Republican Wayne & Linda_Brummer        1      0      1       0     2
4:  Lower Republican _Franklin County Farm        1      0      0       1     2
  • Still there are four farmers who have farmland across several counties.
    • The above table shows whether a farmer has farmland in each of those counties. If farmland located in a county, 1, otherwise 0.
ne_county_sf <-  tigris::counties("Nebraska", cb = TRUE, resolution="20m", progress_bar = FALSE) %>%
  dplyr::select(COUNTYFP, NAME) %>%
  filter(NAME %in% str_to_sentence(unique(mod_reg_data$county)))%>%
  st_transform(4269)

county_cl_sf <-
  reg_data_LR %>%
  .[nrd_owner_name %in% county_cl[total>1, nrd_owner_name],] %>%
  distinct(nrd_owner_name, longdd, latdd) %>%
  st_as_sf(., coords = c("longdd","latdd"), crs = 4269)

ggplot()+
  geom_sf(data=ne_county_sf, aes(fill=NAME), alpha=0.6) +
  geom_sf(data=nrd_boud, color="blue", fill=NA) +  
  geom_sf(data=county_cl_sf, size=1)+
  facet_wrap(~nrd_owner_name, ncol=2)

# reg_data_LR %>%
#   .[nrd_owner_name == "Lower Republican John & Ingrid_Tangeman"] %>%
#   distinct(nrd_owner_name, longdd, latdd, .keep_all = TRUE)

# reg_data_LR %>%
#   .[nrd_owner_name == "Lower Republican David_Black"] %>%
#   distinct(nrd_owner_name, longdd, latdd, .keep_all = TRUE)
  • Each panel in the figure above shows how farmland of each of those four farmers is distributed.

    • Interestingly, each of two farm belonging to “Franklin County Farm” locates in TB and LR respectively although they were subjected to the groundwater allocation limit on data.


    • Each of the farmland of “John & Ingrid_Tangeman” and “David_Black” are far apart, respectively. (Cannot be helped)
      • “John & Ingrid_Tangeman” farmland: Furnas county
        • because the irrigated acres in the farmland in Furnas county is much lager the other one
      • “David_Black” farmland: Harlan county
        • because two out of the three farmland is located in Harlan county


    • Regarding the farmland of “Franklin County Farm” and ” Wayne & Linda_Brummer”, it looks okay to regard them as being located in the same county.
      • “Franklin County Farm” farmland : Franklin county
      • “Wayne & Linda_Brummer” farmland : Harlan county
# reg_data_LR$county %>% unique()

reg_data_LR2 <- 
  reg_data_LR %>%
  .[,county_fix := case_when(
    # - first group - #
    nrd_owner_name == "Lower Republican John & Ingrid_Tangeman" ~ "FURNAS",
    nrd_owner_name == "Lower Republican David_Black" ~ "HARLAN",
    # - second group  - #
    nrd_owner_name == "Lower Republican _Franklin County Farm" ~ "FRANKLIN",
    nrd_owner_name == "Lower Republican Wayne & Linda_Brummer" ~ "HARLAN",
    TRUE ~ county
  )]

# county_cl2 <- 
#   reg_data_LR2 %>%
#   distinct(nrd_owner_name, county_fix) %>%
#   .[, index := 1,] %>%
#   dcast(.,nrd_owner_name ~ county_fix, value.var = "index") %>%
#   replace(is.na(.), 0) %>%
#   .[, total := rowSums(.SD), by = nrd_owner_name]

# county_cl2[total>1, ]
  • Now, each farmers in LR have farmland in a single county

3.3 Data aggregation by taking mean values

reg_data_LR_mean_final <- 
  reg_data_LR2 %>%
    .[,.(
    usage = mean(usage),
    treat2 = mean(treat2),
    # --- soil --- #
    silttotal_r = mean(silttotal_r),
    claytotal_r = mean(claytotal_r),
    slope_r = mean(slope_r),
    ksat_r = mean(ksat_r),
    awc_r = mean(awc_r),
    # --- weather --- #    
    pr_in = mean(pr_in),
    pet_in = mean(pet_in),
    gdd_in = mean(gdd_in),
    # --- tr --- #
    county = unique(county_fix)
    ), by = .(nrd_owner_name, year)] %>%
    .[,county_year := paste0(county, "_", year)]

reg_data_TB_final <-
  reg_data_TB %>%
  .[, names(reg_data_LR_mean_final) %>% .[-length(.)], with=FALSE] %>%
  .[,county_year := paste0(county, "_", year)]


reg_data_final <- rbind(reg_data_LR_mean_final, reg_data_TB_final)

4 CF analysis

4.1 No clustering

4.2 Clustering with county_year

5 Next step

  • The usge is the average of water use (acre-inch).
  • Then, I took a simple average of usge by owner
  • So, I’m taking average of average. This could be a problematic if the sizes of farmland owned by the same owner are different because it means the both fields gets the same wight
  • For example, owner A has two fields:
    • Field 1: area 200 acre, water use 140, so 0.7 acre inch usage
    • Field 2: area 100 acre, water use 50, so 0.2 acre inch usage
    • Simple average of usage is \(\frac{0.7+0.2}{2}=0.45\)
    • Using a weighted average, \(\frac{200}{200+100}\times0.7 + \frac{100}{200+100}\times0.5=\frac{140+50}{300}=0.63 > 0.45\)


+ So I needed to calculate with weighted average. (weighted by acres). + We have two i.acres and acres + Need to have correct data. Can I reproduce the data from scratch?